---
title: Tutorial Part 1 - Connect to Data
description: Part 1 - If you're ready to get started with Meltano and connect to data, you've come to the right place!
layout: doc
sidebar_position: 3
---

import Termy from '@site/src/utils/Termy'

Let’s learn by example.

Throughout this tutorial, we’ll walk you through one of many applications of Meltano. All of them are about:
- connecting to data
- processing it
- and finally storing it in some permanent storage or pushing it to another component

In this part, we're going to start with connecting to data.

We're going to take data from a "source", namely GitHub, and extract a list of commits to one repository.

To test that this part works, we will dump the data into JSON files.
In Part 2, we will then place this data into a PostgreSQL database.

We'll assume you have [Meltano installed](/getting-started/installation) already. You can check that Meltano is installed, and which version you have, by running `meltano --version`:

<Termy>

```console
$ meltano --version
meltano, version 3.8.0
```

</Termy>

<br />

This tutorial is written for Meltano v3.0.0 or later.

If you don't have a GitHub account to follow along, you can either swap in a different tap, like GitLab or PostgreSQL, or create a free GitHub account. You will also need a [personal access token for your GitHub account](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).

:::tip
<p>If you're having trouble throughout this tutorial, you can always head over to the <a href="https://meltano.com/slack">Slack channel</a> to get help.</p>
:::

## Create Your Meltano Project

Step 1 is to create a new [Meltano project](/concepts/project) that (among other things)
will hold the [plugins](/concepts/plugins) that implement the details of our pipeline.

1. Navigate to the directory that you'd like to hold your Meltano projects.

2. Initialize a new project in a directory of your choosing using [`meltano init`](/reference/command-line-interface#init):

```bash
meltano init my-meltano-project
```

<br />

<Termy>

```console
$ meltano init my-meltano-project
Created my-meltano-project
Creating project files...
  my-meltano-project/
  |-- .meltano
  |-- meltano.yml
  |-- README.md
  |-- requirements.txt
  |-- output/.gitignore
  |-- .gitignore
  |-- extract/.gitkeep
  |-- load/.gitkeep
  |-- transform/.gitkeep
  |-- analyze/.gitkeep
  |-- notebook/.gitkeep
  |-- orchestrate/.gitkeep
Creating system database...  Done!
... Project my-meltano-project has been created!

Meltano Environments initialized with dev, staging, and prod.
To learn more about Environments visit: https://docs.meltano.com/concepts/environments

Next steps:
  cd my-meltano-project
  Visit https://docs.meltano.com/getting-started#create-your-meltano-project to learn where to go from here.
```

</Termy>

<br />

This action will create a new directory with, among other things, your [`meltano.yml` project file](/concepts/project#meltano-yml-project-file). Your file will look something like this:

```yml
default_environment: dev
project_id: <unique-GUID>
environments:
- name: dev
- name: staging
- name: prod
```

3. Navigate to the newly created project directory:

```bash
cd my-meltano-project
```

## Add an Extractor to Pull Data from a Source

Now that you have your very own Meltano project, it's time to add [plugins](/concepts/plugins) to it. We're going to add an extractor for GitHub to get our data. An [extractor](/concepts/plugins#extractors) is responsible for pulling data out of a data source. In this case, we use the default `meltanolabs` variant, which keeps this tutorial easy to follow.

1. Add the GitHub extractor:

```bash
# Simplified syntax - plugin type is automatically detected
meltano add tap-github  # Automatically detected as extractor

# Explicit plugin type for disambiguation:
# meltano add --plugin-type extractor tap-github

# Deprecated positional syntax:
# meltano add extractor tap-github
```

<br />

<Termy>

```console
$ meltano add tap-github
Added extractor 'tap-github' to your Meltano project
Variant:        meltanolabs (default)
Repository:     https://github.com/meltanolabs/tap-github
Documentation:  https://hub.meltano.com/extractors/tap-github--meltanolabs

2024-01-01T00:25:40.604941Z [info     ] Installing extractor 'tap-github'
2024-01-01T00:25:53.152127Z [info     ] Installed extractor 'tap-github'

To learn more about extractor 'tap-github', visit https://hub.meltano.com/extractors/tap-github--meltanolabs
```

</Termy>

<br />

This will add the new plugin to your [`meltano.yml` project file](/concepts/project#plugins):

```yml
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
```

2. Test that the installation was successful by calling [`meltano invoke`](/reference/command-line-interface#invoke):

```bash
meltano invoke tap-github --help
```

<br />

If you see the extractor's help message printed, the plugin was installed successfully.

<Termy>

```console
$ meltano invoke tap-github --help
2024-09-19T09:32:05.162591Z [info     ] Environment 'dev' is active
Usage: tap-github [OPTIONS]

  Execute the Singer tap.

Options:
  --state PATH              Use a bookmarks file for incremental replication.
  --catalog PATH            Use a Singer catalog file with the tap.
  --test TEXT               Use --test to sync a single record for each
                            stream. Use --test=schema to test schema output
                            without syncing records.
  --discover                Run the tap in discovery mode.
  --config TEXT             Configuration file location or 'ENV' to use
                            environment variables.
  --format [json|markdown]  Specify output style for --about
  --about                   Display package metadata and settings.
  --version                 Display the package version.
  --help                    Show this message and exit.
```

</Termy>

### Configure the Extractor

The GitHub tap requires [configuration](/guide/configuration) before it can start extracting data.

1. The simplest way to configure a new plugin in Meltano is interactive mode:

```bash
meltano config tap-github set --interactive
```

2. Follow the prompts to step through all available settings. The ones you'll need to fill out are `repositories` (formatted like `["meltano/meltano"]`), `start_date`, and `auth_token`.

<br />

<Termy>

```console
$ meltano config tap-github set --interactive
Configuring Extractor 'tap-github' Interactively
[...]
Settings
 1. additional_auth_tokens:                                                       [...]
 2. auth_token: GitHub token to authenticate ...
 [...]
 8. repositories: An array of strings containing the github repos to be ...
 [...]
 11. start_date:
 [...]
To learn more about extractor 'tap-github' and its settings, visit https://hub.meltano.com/extractors/tap-github--meltanolabs

Loop through all settings (all), select a setting by number (1 - 16), or exit (e)? [all]:
$ 2
[...]Description:
GitHub token to authenticate with.
New value (redacted):
$
Repeat for confirmation:
$
[... other 2 values ...]
```

</Termy>

<br />

This will add the configuration to your [`meltano.yml` project file](/concepts/project#plugin-configuration):

```yml
  plugins:
    extractors:
      - name: tap-github
        config:
          start_date: '2024-01-01'
          repositories:
          - meltano/meltano
```

It will also add your secret auth token to the file `.env`:

```bash
TAP_GITHUB_AUTH_TOKEN='ghp_XXX' # your token!
```

3. Double-check the config by running [`meltano config tap-github list`](/reference/command-line-interface#config):

```bash
meltano config tap-github list
```

<br />

<Termy>

```console
$ meltano config tap-github list
2024-07-08T16:27:36.433823Z [info     ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.

additional_auth_tokens [env: TAP_GITHUB_ADDITIONAL_AUTH_TOKENS] current value: None (default)
        Additional Auth Tokens: List of GitHub tokens to authenticate with. Streams will loop through them when hitting rate limits.
auth_token [env: TAP_GITHUB_AUTH_TOKEN] current value: (redacted) (from the TAP_GITHUB_AUTH_TOKEN variable in `.env`)
        Auth Token: GitHub token to authenticate with.
...
repositories [env: TAP_GITHUB_REPOSITORIES] current value: ['meltano/meltano', 'meltano/hub', 'meltano/sdk'] (from `meltano.yml`)
        Repositories: An array of strings containing the github repos to be included
...
start_date [env: TAP_GITHUB_START_DATE] current value: '2024-01-01' (from `meltano.yml`)
```

</Termy>
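
The `[env: ...]` labels in the listing above, like the `TAP_GITHUB_AUTH_TOKEN` entry in `.env`, appear to follow a simple pattern: the plugin name and the setting name joined by an underscore, upper-cased, with dashes turned into underscores. The Python sketch below reproduces that pattern for the names seen in this tutorial. `setting_env_var` is a hypothetical helper written for illustration and is not part of Meltano's API; settings with other special characters may be handled differently by Meltano itself.

```python
def setting_env_var(plugin_name: str, setting_name: str) -> str:
    """Build an environment variable name such as TAP_GITHUB_AUTH_TOKEN.

    Hypothetical helper: it only mirrors the pattern visible in this
    tutorial (upper-case, dashes replaced by underscores).
    """
    return f"{plugin_name}_{setting_name}".replace("-", "_").upper()


if __name__ == "__main__":
    for setting in ("auth_token", "repositories", "start_date"):
        print(setting_env_var("tap-github", setting))
```

Knowing the pattern is handy when you want to supply a setting from your shell or CI environment instead of `meltano.yml`.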

### Select Entities and Attributes to Extract

Now that the extractor has been configured, it'll know where and how to find your data, but won't yet know which specific entities and attributes (tables and columns) you're interested in.

By default, Meltano will instruct extractors to extract all supported entities and attributes, but we're going to [select specific ones for this tutorial](/guide/integration#selecting-entities-and-attributes-for-extraction).

1. Find out what entities and attributes are available, using [`meltano select YOUR_TAP --list --all`](/reference/command-line-interface#select):

```bash
meltano select tap-github --list --all
```

<br />

<Termy>

```console
$ meltano select tap-github --list --all
2023-05-17T20:07:24.085827Z [info     ] The default environment 'dev' will be ignored for 'meltano select'. To configure a specific environment, please use the option '--environment=environment name'.

Legend:
        selected
        excluded
        automatic

Enabled patterns:
        *.*

Selected attributes:
        [selected ] assignees._sdc_repository
        [automatic]
        [...]
        [selected ] commits.comments_url
        [selected ] commits.commit
        [selected ] commits.commit.author
        [...]
        [selected ] teams.repositories_url
        [selected ] teams.slug
        [selected ] teams.url
```

</Termy>

2. Select the entities and attributes for extraction using [`meltano select`](/reference/command-line-interface#select):

```bash
meltano select tap-github commits url
meltano select tap-github commits sha
meltano select tap-github commits commit_timestamp
```

<br />

This will add the [selection rules](/concepts/plugins#select-extra) to your [`meltano.yml` project file](/concepts/project#plugin-configuration):

```yml
default_environment: dev
environments:
- name: dev
- name: staging
- name: prod
project_id: YOUR_ID
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      start_date: '2024-01-01'
      repositories:
      - sbalnojan/meltano-lightdash
    select:
    - commits.url
    - commits.sha
    - commits.commit_timestamp

```

3. Run [`meltano select --list`](/reference/command-line-interface#select) to double-check your selection:

```bash
meltano select tap-github --list
```
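
To build intuition for how the rules under `select` relate to the `entity.attribute` names printed by `meltano select --list`, here is a minimal Python sketch. It assumes shell-style wildcard matching via the standard library's `fnmatch` module, which is consistent with the `*.*` default pattern shown earlier. `is_selected` is a hypothetical helper, not Meltano's API, and Meltano's real logic covers cases this sketch ignores, such as exclusion rules and attributes marked `automatic`.

```python
from fnmatch import fnmatchcase

# The three rules added to meltano.yml in this tutorial.
SELECT_RULES = ["commits.url", "commits.sha", "commits.commit_timestamp"]


def is_selected(entity, attribute, rules=SELECT_RULES):
    """Return True if "entity.attribute" matches at least one rule.

    Illustration only: matching is assumed to be shell-style wildcards.
    """
    path = f"{entity}.{attribute}"
    return any(fnmatchcase(path, rule) for rule in rules)


if __name__ == "__main__":
    print(is_selected("commits", "sha"))              # True
    print(is_selected("commits", "comments_url"))     # False
    print(is_selected("teams", "slug", ["*.*"]))      # True
```

Attributes that the tap marks as `automatic` are extracted even when no rule names them, so the output can contain a few fields beyond the three selected here.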

## Add a Dummy Loader to Dump the Data into JSON

To test that the extraction process works, we'll add a JSONL target, which writes each record as one line of JSON.

1. Add the JSONL target using `meltano add target-jsonl` (the plugin type is automatically detected).

   <br />

<Termy>

```console
$ meltano add target-jsonl
Added loader 'target-jsonl' to your Meltano project
Variant:        andyh1203 (default)
Repository:     https://github.com/andyh1203/target-jsonl
Documentation:  https://hub.meltano.com/loaders/target-jsonl--andyh1203

2024-01-01T00:25:40.604941Z [info     ] Installing loader 'target-jsonl'
2024-01-01T00:25:53.152127Z [info     ] Installed loader 'target-jsonl'

To learn more about loader 'target-jsonl', visit https://hub.meltano.com/loaders/target-jsonl--andyh1203
```

</Termy>

<br />

This target requires no configuration; it simply writes the data to a `jsonl` file.

## Do a Test Run to Verify the Extraction Process Works

Now that [your Meltano project](#create-your-meltano-project), [extractor](#add-an-extractor-to-pull-data-from-a-source), and [dummy loader](#add-a-dummy-loader-to-dump-the-data-into-json) are set up, we can test run the extraction process.

There's just one step here: run your newly added extractor and jsonl loader in a pipeline using [`meltano run`](/reference/command-line-interface#run):

```bash
meltano run tap-github target-jsonl
```

<br />

<Termy>

```console
$ meltano run tap-github target-jsonl
2024-09-19T13:53:36.403099Z [info     ] Environment 'dev' is active
2024-09-19T13:53:41.071885Z [warning  ] No state was found, complete import.
2024-09-19T13:53:43.054384Z [info     ] INFO Starting sync of repository: sbalnojan/meltano-lightdash cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
2024-09-19T13:53:43.553171Z [info     ] INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.4796161651611328, "tags": {"endpoint": "commits", "http_status_code": 200, "status": "succeeded"}} cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
2024-09-19T13:53:43.561190Z [info     ] INFO METRIC: {"type": "counter", "metric": "record_count", "value": 1, "tags": {"endpoint": "commits"}} cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
2024-09-19T13:53:43.735250Z [info     ] Incremental state has been updated at 2024-09-19 13:53:43.734535.
2024-09-19T13:53:43.820467Z [info     ] Block run completed            block_type=ExtractLoadBlocks err=None set_number=0 success=True
```

</Termy>

<br />

You should see data flowing from your source into the jsonl file.
You can verify that it worked by looking inside the newly created file called `output/commits.jsonl`.

<Termy>

```console
$ cat output/commits.jsonl
{"sha": "409bdd601e0531833665f538bccecd0f69e101c0", "node_id": "C_kwDOH_twHNoAKDQwOWJkZDYwMWUwNTMxODMzNjY1ZjUzOGJjY2VjZDBmNjllMTAxYzA", "url": "https://api.github.com/repos/sbalnojan/meltano-lightdash/commits/409bdd601e0531833665f538bccecd0f69e101c0", "commit_timestamp": "2024-09-14T12:41:21Z"}
```

</Termy>
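
If you'd rather check the file programmatically than read it by eye, the Python sketch below parses `output/commits.jsonl` and reports any record that lacks one of the three selected attributes. It is an illustration under simple assumptions, not part of Meltano: `read_jsonl` and `missing_fields` are hypothetical helper names, and the file path is the one used in this tutorial.

```python
import json
from pathlib import Path

# The attributes selected earlier with `meltano select`.
REQUIRED_FIELDS = {"url", "sha", "commit_timestamp"}


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def missing_fields(records, required=REQUIRED_FIELDS):
    """Map 1-based record numbers to the sorted field names they lack."""
    problems = {}
    for number, record in enumerate(records, start=1):
        missing = set(required) - set(record)
        if missing:
            problems[number] = sorted(missing)
    return problems


if __name__ == "__main__":
    path = Path("output/commits.jsonl")
    if path.exists():
        records = read_jsonl(path)
        print(f"{len(records)} record(s) read, problems: {missing_fields(records)}")
    else:
        print(f"{path} not found; run the pipeline first")
```

Run it from the project directory. An empty problems dictionary means every record carries the selected attributes.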

## Next Steps

Next, head over to [Part 2: Loading extracted data into a target](/getting-started/part2).
